Text File | 1985-03-31 | 1KB | 35 lines
WORDS
Words have a precise meaning in LOCATE. A
word is a sequence of letters with no
intervening punctuation or digits. In the text
"March 14, 1980", the only word is "march".
Case is not significant: "MARCH", "march", and
"MaRcH" are all the same word. In the text
"Pascal86", the only word is "pascal". INDEX
does not distinguish between proper nouns and
regular words.
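The word rule above can be sketched in a few lines of Python. This is a minimal illustration that assumes only what the text states (a word is a run of letters, and case is folded). The function name `split_words` is hypothetical and is not part of LOCATE or INDEX.

```python
import re

def split_words(text):
    """Return the words in text: maximal runs of letters, lowercased.

    Digits and punctuation end a word and are never part of one.
    (Hypothetical helper for illustration; not LOCATE's actual code.)
    """
    return [run.lower() for run in re.findall(r"[A-Za-z]+", text)]

print(split_words("March 14, 1980"))  # ['march']
print(split_words("Pascal86"))        # ['pascal']
```

Because matching stops at the first non-letter, "Pascal86" yields only "pascal", and every capitalization of "march" collapses to the same lowercase word.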
Words of fewer than three letters are
ignored. Words of more than seven letters
cannot be told apart if their first seven
letters match: "Democracy" and "democratic"
are treated as the same word.
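The length rules can be expressed as a small key function. This is a sketch under the assumption that a word is first lowercased, then dropped if shorter than three letters, then cut to its first seven letters; `index_key` is a hypothetical name, not something LOCATE exposes.

```python
def index_key(word):
    """Map a word to the key it would be indexed under, or None.

    Assumed rules from the text: words shorter than three letters are
    ignored, and only the first seven letters are significant.
    (Hypothetical helper for illustration; not LOCATE's actual code.)
    """
    word = word.lower()
    if len(word) < 3:
        return None      # too short to be indexed
    return word[:7]      # letters past the seventh are dropped

print(index_key("Democracy"))   # democra
print(index_key("democratic"))  # democra
print(index_key("of"))          # None
```

Both "Democracy" and "democratic" reduce to the key "democra", which is why the two cannot be told apart.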
Some words occur frequently in all
documents and so do not help distinguish the
contents of one file from another. If indexed,
these words would use up disk space and would
also contribute to the "false hit" problem.
INDEX uses a list of common words to decide
which words to ignore.
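Filtering against a common-word list can be sketched as follows. The list shown is hypothetical: this file does not give the contents of INDEX's actual list. The sketch applies the letters-only and three-letter-minimum rules described earlier, then drops common words; for brevity it omits the seven-letter truncation.

```python
import re

# Hypothetical common-word list; INDEX's real list is not given in this file.
COMMON_WORDS = {"the", "and", "for", "with", "that", "this", "are", "was"}

def indexable_words(text, common=COMMON_WORDS):
    """Return the words of text that would be worth indexing.

    Words are runs of letters, lowercased; words under three letters are
    skipped, and common words are dropped because they appear in nearly
    every file and so cannot tell files apart.
    """
    words = (w.lower() for w in re.findall(r"[A-Za-z]+", text))
    return [w for w in words if len(w) >= 3 and w not in common]

print(indexable_words("The index and the file are on disk"))
# ['index', 'file', 'disk']
```

Here "on" is dropped for being too short, while "the", "and", and "are" are dropped by the common-word list.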
LOCATE distinguishes a total of 4093 words.
Most people use about 3,000 words in common
usage and perhaps 30,000 in academic or
technical usage; proper nouns can extend these
numbers significantly. The choice of 4093 is a
compromise between speed, storage, and the
number of false hits.
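This file does not say how LOCATE maps words onto its 4093 entries. One plausible reading, assumed here only for illustration, is a fixed table of 4093 slots addressed by a hash of the word (4093 is prime, a common choice for hash-table sizes, which fits this reading but does not prove it). Under that assumption storage stays bounded, but once more than 4093 distinct words occur, some must share a slot, and a search for one word would also match files containing the other. The hash function and the names `slot` and `find_collision` are arbitrary, hypothetical choices, not LOCATE's.

```python
import itertools
import string

TABLE_SIZE = 4093  # number of distinct words LOCATE tells apart (from the text)

def slot(word, size=TABLE_SIZE):
    """Hash a word to a slot in 0..size-1.

    Assumed scheme for illustration only; the text does not say how
    LOCATE assigns words to its 4093 entries.
    """
    h = 0
    for ch in word.lower():
        h = (h * 31 + ord(ch)) % size
    return h

def find_collision(words, size=TABLE_SIZE):
    """Return the first pair of different words that share a slot, or None."""
    first_in_slot = {}
    for w in words:
        s = slot(w, size)
        other = first_in_slot.get(s)
        if other is None:
            first_in_slot[s] = w
        elif other != w:
            # Different words, same slot: a search for one would match the other.
            return other, w
    return None

# 26**3 = 17,576 three-letter words cannot fit in 4093 slots without sharing.
three_letter = ("".join(t) for t in itertools.product(string.ascii_lowercase, repeat=3))
pair = find_collision(three_letter)
print(pair, slot(pair[0]), slot(pair[1]))
```

A larger table would make such collisions rarer at the cost of more storage, which is the kind of trade-off the text describes.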